Introduction

Introduction to study can be found in the Deseq file. WCGNA analysis incorporates trait data to determine what gene groupings are correlated with different traits. For our data, this allows us to check that differential gene expression is due to the different treatments and not due to differences in cell density or growth rate.

R version

We used R version R-4.0.3 (Package versions listed below)

library(WGCNA) #Version 1.70-3
library(impute) #Version 1.64.0
library(flashClust) #Version 1.01-2
library(DESeq2) #Version  1.30.1
library(pheatmap) #Version 1.0.12

Data preparation

First, we must rename and filter our data to get the best results.

We read in our count data file that was created from the raw sequences. Row names were changed to match the row names of the GO term files to allow proper GO enrichment analysis later.

countData <- read.table("B_psygmophilum_counts.txt")
length(countData[,1])
## [1] 28265
names(countData)=c( "Control_1.1", "Control_1", "Control_2.1", "Control_2", "Control_3.1", "Control_3", "Control_4.1", "Control_4", "Cool_1", "Cool_2", "Cool_3", "Cool_4","Heat_1.1", "Heat_1", "Heat_2.1", "Heat_2", "Heat_3.1", "Heat_3", "Heat_4.1", "Heat_4")
row.names(countData)=sub("", "isogroup", rownames(countData))

treat=c( "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Control", "Cool", "Cool", "Cool", "Cool","Heat", "Heat", "Heat", "Heat", "Heat", "Heat", "Heat", "Heat")
g=data.frame(treat)
colData<- g

We run the same DeSeq object as before but we filter out trash counts

We only kept contigs that with a baseMean <3 which gets rid of low counts that would be annoying

dds<-DESeqDataSetFromMatrix(countData=countData, colData=colData, design=~ treat) 
dds<-DESeq(dds)
res<- results(dds)
res3<-res[res$baseMean>3, ]
dim(res) #28265
## [1] 28265     6
dim(res3) #15962
## [1] 15962     6

This filtered out about 12,000 contigs.

Make rlog data

We run an rlog transformation which is a normalization method important for later downstream analysis.

rld <- rlogTransformation(dds, blind=TRUE, fitType="local")
rld_wg=(assay(rld))
nrow(rld_wg)
## [1] 28265

The above object still has the trash counts so we filter again as we did above.

rldFiltered=(assay(rld))[(rownames((assay(rld))) %in% rownames(res3)),]
nrow(rldFiltered)
## [1] 15962

This data then gets written into a csv file so we can use the data in the WGCNA analysis

write.csv( rldFiltered,file="Bpsy_wgcna_allgenes.csv",quote=F,row.names=T)

WGCNA analysis

Data input and cleaning

We read in the data file created above and rename the rows based on the isogroup names.

options(stringsAsFactors=FALSE)
allowWGCNAThreads()
## Allowing multi-threading with up to 28 threads.
dat=read.csv("Bpsy_wgcna_allgenes.csv")
rownames(dat)<-dat$X
dat$X=NULL
names(dat)
##  [1] "Control_1.1" "Control_1"   "Control_2.1" "Control_2"   "Control_3.1"
##  [6] "Control_3"   "Control_4.1" "Control_4"   "Cool_1"      "Cool_2"     
## [11] "Cool_3"      "Cool_4"      "Heat_1.1"    "Heat_1"      "Heat_2.1"   
## [16] "Heat_2"      "Heat_3.1"    "Heat_3"      "Heat_4.1"    "Heat_4"
nrow(dat)
## [1] 15962

Now we create a data frame to check for outliers. An output of TRUE indicates that there are no outlier genes.

datExpr0 = as.data.frame(t(dat))
gsg = goodSamplesGenes(datExpr0, verbose = 3);
##  Flagging genes and samples with too many missing values...
##   ..step 1
gsg$allOK
## [1] TRUE

We found no outliers!

Now we read in the data file containing trait information. Traits are average cell count, average cells per mL, average cell growth rate, and thermal stress treatment.

traitData= read.csv("Bpsy_Traits_Num.csv", row.names=1)
names(traitData)
## [1] "Avg_Cell_Count"  "Avg_Cell_Per_mL" "Growth_Rate"     "Control"        
## [5] "Cool"            "Heat"

This makes sure that the row names of our count data and the row names of our trait data are the same to ensure proper alignment of the datasets. This is required for proper anazlying of count data in terms of the traits. If the code returns TRUE, the files are properly aligned.

rownames(traitData)=rownames(datExpr0)
traitData$Sample= NULL 
datTraits=traitData

table(rownames(datTraits)==rownames(datExpr0))
## 
## TRUE 
##   20

Now we create a dendogram and trait heat map showing how linked the gene expression in each sample is. We calculate whole network connectivity and choose signed as we are interested in the direction of gene expression

A=adjacency(t(datExpr0),type="signed") 
k=as.numeric(apply(A,2,sum))-1
Z.k=scale(k)
thresholdZ.k=-2.5 
outlierColor=ifelse(Z.k<thresholdZ.k,"red","black")
sampleTree = flashClust(as.dist(1-A), method = "average")
traitColors=data.frame(numbers2colors(datTraits,signed=FALSE))
dimnames(traitColors)[[2]]=paste(names(datTraits))
datColors=data.frame(outlierC=outlierColor,traitColors)

plotDendroAndColors(sampleTree,groupLabels=names(datColors), colors=datColors,main="Sample dendrogram and trait heatmap")

Figure 2.1 Sample dendrogram and trait heatmap. The top half of the graph shows a dendrogram illustrating how related each sample is. The heatmap of the traits on the bottome half of the graph combined with the dendrogram show that samples are grouped most closely by what treatment they were exposed to.

we save this because it can take a lot to actually make this with large dataset

save(datExpr0, datTraits, file="Bpsy_Samples_Traits_ALL.RData")

Network construction and module detection

First we figure out the proper soft-thresholding power

options(stringsAsFactors = FALSE)
allowWGCNAThreads() 
lnames = load(file="Bpsy_Samples_Traits_ALL.RData")

powers = c(seq(1,14,by=2), seq(15,30, by=0.5));
sft = pickSoftThreshold(datExpr0, powerVector = powers, networkType="signed", verbose = 2) 
par(mfrow = c(1,2));
cex1 = 0.9;

plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",ylim=c(0,1),
     main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     labels=powers,cex=cex1,col="red");
abline(h=0.90,col="red")
plot(sft$fitIndices[,1], sft$fitIndices[,5],
     xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
     main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")

Figure 2.2 Plots of scale free topology or mean connectivity versus soft threshold. Based on when both graphs start to plateau, a soft threshold power of 9 was chosen for the downstream analysis.

Calculating similarity

After choosing a soft power, we calculate adjacency which can be translated into topological overlap matrix which is a separate measure of similarity.

softPower=9
adjacency=adjacency(datExpr0, power=softPower,type="signed")
TOM= TOMsimilarity(adjacency,TOMType = "signed")
dissTOM= 1-TOM

Figure 2.3 Gene clustering based on TOM dissimilarity. Based on the TOM calculations done above, genes are clustered based on their similarity which is displayed in this dendrogram. Each leaf is a gene and branches that are grouped together densely are interconnected, highly co-expressed genes.

Module creation

Now that we have dendrogram and TOM dissimilarity, we can use these to define modules

We choose only large modules. We use a hybrid method of dynamic tree cutting that uses both the dendrogram and TOM dissimilairty as inputs to cut and merge modules.

minModuleSize=90
dynamicMods= cutreeDynamic(dendro= geneTree, distM= dissTOM, deepSplit=2, pamRespectsDendro= FALSE, minClusterSize= minModuleSize)
##  ..cutHeight not given, setting it to 0.988  ===>  99% of the (truncated) height range in dendro.
##  ..done.
table(dynamicMods)
## dynamicMods
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
## 3940 1974 1781 1604 1139 1047  955  920  784  744  458  218  207  191

These modules are assigned colors and graphed underneath the dendrogram.

dynamicColors= labels2colors(dynamicMods)
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut", dendroLabels= FALSE, hang=0.03, addGuide= TRUE, guideHang= 0.05, main= "Gene dendrogram and module colors")

Figure 2.4 Dendrogram graphed with moduled membership. After dynamic tree cutting, the gene dendrogram is graphed with the module membership graphed underneath.

We currently have a lot of modules that are likely closely related. Therefore, we want to merge together these modules of similar gene expression profiles to get the best correlations with the different traits. To do this we calculated eigengenes and then calculated the similarity between them. Modules are merged based on a cut value of 0.4 and the new larger modules are saved for additional analysis.

MEList= moduleEigengenes(datExpr0, colors= dynamicColors,softPower = 9)
MEs= MEList$eigengenes

MEDiss= 1-cor(MEs)

METree= flashClust(as.dist(MEDiss), method= "average")

save(dynamicMods, MEList, MEs, MEDiss, METree, file= "Network_bpsy_nomerge.RData")

lnames = load(file = "Network_bpsy_nomerge.RData")

plot(METree, main= "Clustering of module eigengenes", xlab= "", sub= "")
MEDissThres= 0.4
abline(h=MEDissThres, col="red")

Figure 2.5 Dendrogram of modules with cut off point. Modules below the cut off value were merged together. These modules were closely related enough to be analyzed together as one.

merge= mergeCloseModules(datExpr0, dynamicColors, cutHeight= MEDissThres, verbose =3)

mergedColors= merge$colors
mergedMEs= merge$newMEs

plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors), c("Dynamic Tree Cut", "Merged dynamic"), dendroLabels= FALSE, hang=0.03, addGuide= TRUE, guideHang=0.05)

dev.off()

moduleColors= mergedColors
colorOrder= c("turquoise", standardColors(50))
moduleLabels= match(moduleColors, colorOrder)-1
MEs=mergedMEs

save(MEs, moduleLabels, moduleColors, geneTree, file= "Network_signed_0.6.RData")

Figure 2.5 Clustered module eigengenes. Eigengenes are plotted as a dendrogram with the names of the given modules at each branch. The red line is the cut point of 0.4. Modules below it are merged together based on their branching to create larger modules of similar expression patterns.

Relating modules to traits

Now that we have modules of similar gene expression patterns, we want to see how they relate to the different sample traits

For our dataset, we are testing whether gene expression differences between thermal stress treatment are actually do to those treatments or whether other variables like cell density caused the differential expression.

First we load in our expression, trait, and network data from the previous code.

options(stringsAsFactors = FALSE);
lnames = load(file = "Bpsy_Samples_Traits_ALL.RData");
#The variable lnames contains the names of loaded variables.
lnames = load(file = "Network_signed_0.6.RData");
lnames = load(file = "Network_bpsy_nomerge.RData");
lnames
## [1] "dynamicMods" "MEList"      "MEs"         "MEDiss"      "METree"
nGenes = ncol(datExpr0)
nSamples = nrow(datExpr0)
table(moduleColors)
## moduleColors
##       brown        cyan greenyellow     magenta   turquoise 
##        4731         191        3176        2659        5205

After merging, we have five modules with the size of each listed underneath.

Now we recalculate MEs with color labels and plot a heatmap of the module correlations with the different traits.

MEs0 = moduleEigengenes(datExpr0, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use = "p");
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples);

textMatrix = paste(signif(moduleTraitCor, 2), "\n(",
                   signif(moduleTraitPvalue, 1), ")", sep = "");
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3));
labeledHeatmap(Matrix = moduleTraitCor,
               xLabels = names(datTraits),
               yLabels = names(MEs),
               ySymbols = names(MEs),
               colorLabels = FALSE,
               colors = blueWhiteRed(50),
               textMatrix = textMatrix,
               setStdMargins = FALSE,
               cex.text = 0.5,
               zlim = c(-1,1),
               main = paste("Module-trait relationships"))

Figure 2.6 Module-trait relationship heatmap. This plots the relationship between the modules and the sample traits as the y axis has the different modules and the x axis are the different sample traits. Each box displays the correlation value between the module and trait with p-values underneath. -1 indicates that the module always occurs in that trait and its always down-regulated while a correlation of 1 indicates that the module always occurs in that trait and its always up-regulated. Nearly all modules are most correlated with the thermal stress treatment although some modules were also highly correlated with the cell counts and cells per mL.

Analysis of each module

All five modules are analyzed the same way to create heatmaps of their specific expression in each sample and to prepare files to run GO enrichment on each module

Turquoise module

First we make a plot of the gene-trait correlation significance plot to visualize the correlation. The variable weight is defined based on the trait in which the module is most correlated. For the turquoise module, it had a correlation of 1 with the Cool treatment.

weight = as.data.frame(datTraits$Cool); 
names(weight) = "Cool" 
modNames = substring(names(MEs), 3)
geneModuleMembership = as.data.frame(cor(datExpr0, MEs, use = "p"));
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));
names(geneModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");
geneTraitSignificance = as.data.frame(cor(datExpr0, weight, use = "p"));
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));
names(geneTraitSignificance) = paste("GS.", names(weight), sep="");
names(GSPvalue) = paste("p.GS.", names(weight), sep="")

module = "turquoise"
column = match(module, modNames);
moduleGenes = moduleColors==module;
par(mfrow = c(1,1));
verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),
                   abs(geneTraitSignificance[moduleGenes, 1]),
                   xlab = paste("ModMem in", module, "module"),
                   ylab = "Gene Sig for Cool",
                   main = paste("MM vs. GS\n"),
                   cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)

Figure 2.7 Gene-trait significance correlation plot. This plots the gene significance in the cool trait against module membership in the turquoise module. The turquoise module has a perfect linear correlation with the cool treatment with strong significance.

Now we prepare files to make heatmaps of module expression

First we make VSD files for the module and create a csv file of the rlog data for just our desired module.

vs=t(datExpr0)
cands=names(datExpr0[moduleColors=="turquoise"]) 
c.vsd=vs[rownames(vs) %in% cands,]
nrow(c.vsd) 
## [1] 5205
write.csv(c.vsd,"rlog_MMturquoise.csv",quote=F)

We then use this csv file to prepare objects to plot the heatmap of the module. The vsd object specifies the rlog values of each isogroups in each sample. The allkME object assigned kME values to each isogroup for all the modules.

allkME =as.data.frame(signedKME(t(dat), MEs))
vsd=read.csv(file="rlog_MMturquoise.csv", row.names=1)

Plots to visualize module gene expression in each sample

which.module="turquoise"
ME=MEs[, paste("ME",which.module, sep="")]
genes=datExpr0[,moduleColors==which.module ] 

par(mfrow=c(2,1), mar=c(0.3, 5.5, 5, 2))
plotMat(t(scale(genes) ),nrgcols=30,rlabels=F, clabels=rownames(genes), rcols=which.module)
par(mar=c(5, 4.2, 0, 0.7))
barplot(ME, col=which.module, main="", cex.main=2,
        ylab="eigengene expression",xlab="sample")

Figure 2.8 Eigengene of turquoise module. Eigengenes show overall expression of the module in each sample. In each cool flask, the turquoise module is highly expressed while it is less correlated although down-regulated in the other treatments.

Now we make a heatmap of the expression of the top 100 genes in the turquoise module.

whichModule="turquoise"
top=100

datME=MEs
vsd <- read.csv("Bpsy_wgcna_allgenes.csv", row.names=1)
datExpr=t(vsd)
modcol=paste("kME",whichModule,sep="")
sorted=vsd[order(allkME[,modcol],decreasing=T),]
hubs=sorted[1:top,]

contrasting = colorRampPalette(rev(c("chocolate1","#FEE090","grey10", "cyan3","cyan")))(100)
#quartz()
pheatmap(hubs,scale="row",col=contrasting,border_color=NA, main=paste(whichModule,"top",top,"kME",sep=""))

Figure 2.9 Heatmap of top 100 genes in the turquoise module. There is a clear differences in expression pattern between the cool treatment samples and the other treatments. Nearly all the genes are highly up-regulated in the cool samples while the genes are not differentially expressed or slightly downregulated in the heat and control samples. There are some heat flasks clustered with the controls.

Lastly, we make fisher and kME files to use in GO enrichment analysis

The output shows that our data is correct as it gives the total number of isogroups and the number of isogroups in the turquoise module.

vsd <- read.csv("Bpsy_wgcna_allgenes.csv", row.names=1)
options(stringsAsFactors=FALSE)
data=t(vsd)
allkME =as.data.frame(signedKME(data, MEs))

whichModule="turquoise"

length(moduleColors)
## [1] 15962
inModule=data.frame("module"=rep(0,nrow(vsd)))
row.names(inModule)=row.names(vsd)
genes=row.names(vsd)[moduleColors == whichModule]
inModule[genes,1]=1
sum(inModule[,1])
## [1] 5205
write.csv(inModule,file=paste(whichModule,"turquoise_fisher.csv",sep=""),quote=F)

##KME method. input for delta ranks
modColName=paste("kME",whichModule,sep="")
modkME=as.data.frame(allkME[,modColName])
row.names(modkME)=row.names(allkME)
names(modkME)=modColName
write.csv(modkME,file=paste(whichModule,"turqiouse_kME.csv",sep=""),quote=F)

Greenyellow module

First we make a plot of the gene-trait correlation significance plot to visualize the correlation. The variable weight is defined based on the trait in which the module is most correlated. For the greenyellow module, it had a correlation of -0.99 with the Heat treatment.

weight = as.data.frame(datTraits$Heat); 
names(weight) = "Heat" 
modNames = substring(names(MEs), 3)
geneModuleMembership = as.data.frame(cor(datExpr0, MEs, use = "p"));
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));
names(geneModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");
geneTraitSignificance = as.data.frame(cor(datExpr0, weight, use = "p"));
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));
names(geneTraitSignificance) = paste("GS.", names(weight), sep="");
names(GSPvalue) = paste("p.GS.", names(weight), sep="")

module = "greenyellow"
column = match(module, modNames);
moduleGenes = moduleColors==module;
par(mfrow = c(1,1));
verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),
                   abs(geneTraitSignificance[moduleGenes, 1]),
                   xlab = paste("ModMem in", module, "module"),
                   ylab = "Gene Sig for Heat",
                   main = paste("MM vs. GS\n"),
                   cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)

Figure 2.10 Gene-trait significance correlation plot. This plots the gene significance in the heat trait against module membership in the greenyellow module. The greenyellow module has a nearly perfect linear correlation with the heat treatment with strong significance.

Now we prepare files to make heatmaps of module expression

First we make VSD files for the module and create a csv file of the rlog data for just our desired module.

vs=t(datExpr0)
cands=names(datExpr0[moduleColors=="greenyellow"]) 
c.vsd=vs[rownames(vs) %in% cands,]
nrow(c.vsd) 
## [1] 3176
write.csv(c.vsd,"rlog_MMgreenyellow.csv",quote=F)

We then use this csv file to prepare objects to plot the heatmap of the module. The vsd object specifies the rlog values of each isogroups in each sample. The allkME object assigned kME values to each isogroup for all the modules.

allkME =as.data.frame(signedKME(t(dat), MEs))
vsd=read.csv(file="rlog_MMgreenyellow.csv", row.names=1)

Plots to visualize module gene expression in each sample

which.module="greenyellow"
ME=MEs[, paste("ME",which.module, sep="")]
genes=datExpr0[,moduleColors==which.module ] 

par(mfrow=c(2,1), mar=c(0.3, 5.5, 5, 2))
plotMat(t(scale(genes) ),nrgcols=30,rlabels=F, clabels=rownames(genes), rcols=which.module)
par(mar=c(5, 4.2, 0, 0.7))
barplot(ME, col=which.module, main="", cex.main=2,
        ylab="eigengene expression",xlab="sample")

Figure 2.11 Eigengene of greenyellow module. Eigengenes show overall expression of the module in each sample. In each heat flask, the greenyellow module is downregulated while it is less correlated although up-regulated in the other treatment samples.

Now we make a heatmap of the expression of the top 100 genes in the greenyellow module.

whichModule="greenyellow"
top=100

datME=MEs
vsd <- read.csv("Bpsy_wgcna_allgenes.csv", row.names=1)
datExpr=t(vsd)
modcol=paste("kME",whichModule,sep="")
sorted=vsd[order(allkME[,modcol],decreasing=T),]
hubs=sorted[1:top,]

contrasting = colorRampPalette(rev(c("chocolate1","#FEE090","grey10", "cyan3","cyan")))(100)
#quartz()
pheatmap(hubs,scale="row",col=contrasting,border_color=NA, main=paste(whichModule,"top",top,"kME",sep=""))

Figure 2.12 Heatmap of top 100 genes in the greenyellow module. There is a clear differences in expression pattern between the heat treatment samples and the other treatments. Nearly all the genes are highly down-regulated in the heat samples while the genes are mainly up-regulated or not differentially expressed in in the cool and control samples.

Lastly, we make fisher and kME files to use in GO enrichment analysis

The output shows that our data is correct as it gives the total number of isogroups and the number of isogroups in the turquoise module.

vsd <- read.csv("Bpsy_wgcna_allgenes.csv", row.names=1)
options(stringsAsFactors=FALSE)
data=t(vsd)
allkME =as.data.frame(signedKME(data, MEs))

whichModule="greenyellow"

length(moduleColors)
## [1] 15962
inModule=data.frame("module"=rep(0,nrow(vsd)))
row.names(inModule)=row.names(vsd)
genes=row.names(vsd)[moduleColors == whichModule]
inModule[genes,1]=1
sum(inModule[,1])
## [1] 3176
write.csv(inModule,file=paste(whichModule,"greenyellow_fisher.csv",sep=""),quote=F)

##KME method. input for delta ranks
modColName=paste("kME",whichModule,sep="")
modkME=as.data.frame(allkME[,modColName])
row.names(modkME)=row.names(allkME)
names(modkME)=modColName
write.csv(modkME,file=paste(whichModule,"greenyellow_kME.csv",sep=""),quote=F)

Brown module

First we make a plot of the gene-trait correlation significance plot to visualize the correlation. The variable weight is defined based on the trait in which the module is most correlated. For the brown module, it had a correlation of 0.92 with the Heat treatment. This module also has a correlation of 0.97 with the average cell count and cell per mL trait likely due to higher cell counts being correlated with heat treatment.

weight = as.data.frame(datTraits$Heat); 
names(weight) = "Heat" 
modNames = substring(names(MEs), 3)
geneModuleMembership = as.data.frame(cor(datExpr0, MEs, use = "p"));
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));
names(geneModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");
geneTraitSignificance = as.data.frame(cor(datExpr0, weight, use = "p"));
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));
names(geneTraitSignificance) = paste("GS.", names(weight), sep="");
names(GSPvalue) = paste("p.GS.", names(weight), sep="")

module = "brown"
column = match(module, modNames);
moduleGenes = moduleColors==module;
par(mfrow = c(1,1));
verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),
                   abs(geneTraitSignificance[moduleGenes, 1]),
                   xlab = paste("ModMem in", module, "module"),
                   ylab = "Gene Sig for Heat",
                   main = paste("MM vs. GS\n"),
                   cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)

Figure 2.13 Gene-trait significance correlation plot. This plots the gene significance in the heat trait against module membership in the brown module. The brown module has a strong correlation with the heat treatment with strong significance. It is not as well correlated as the previous modules but still significant.

Now we prepare files to make heatmaps of module expression

First we make VSD files for the module and create a csv file of the rlog data for just our desired module.

vs=t(datExpr0)
cands=names(datExpr0[moduleColors=="brown"]) 
c.vsd=vs[rownames(vs) %in% cands,]
nrow(c.vsd) 
## [1] 4731
write.csv(c.vsd,"rlog_MMbrown.csv",quote=F)

We then use this csv file to prepare objects to plot the heatmap of the module. The vsd object specifies the rlog values of each isogroups in each sample. The allkME object assigned kME values to each isogroup for all the modules.

allkME =as.data.frame(signedKME(t(dat), MEs))
vsd=read.csv(file="rlog_MMbrown.csv", row.names=1)

Plots to visualize module gene expression in each sample

which.module="brown"
ME=MEs[, paste("ME",which.module, sep="")]
genes=datExpr0[,moduleColors==which.module ] 

par(mfrow=c(2,1), mar=c(0.3, 5.5, 5, 2))
plotMat(t(scale(genes) ),nrgcols=30,rlabels=F, clabels=rownames(genes), rcols=which.module)
par(mar=c(5, 4.2, 0, 0.7))
barplot(ME, col=which.module, main="", cex.main=2,
        ylab="eigengene expression",xlab="sample")

Figure 2.14 Eigengene of brown module. Eigengenes show overall expression of the module in each sample. In each heat flask, the brown module is up-regulated although expression is not even across the different heat samples. It is also shows a high correlation, although lower correlation than with the heat trait, in the cool samples but it is down-regulated.

Now we make a heatmap of the expression of the top 100 genes in the brown module.

whichModule="brown"
top=100

datME=MEs
vsd <- read.csv("Bpsy_wgcna_allgenes.csv", row.names=1)
datExpr=t(vsd)
modcol=paste("kME",whichModule,sep="")
sorted=vsd[order(allkME[,modcol],decreasing=T),]
hubs=sorted[1:top,]

contrasting = colorRampPalette(rev(c("chocolate1","#FEE090","grey10", "cyan3","cyan")))(100)

pheatmap(hubs,scale="row",col=contrasting,border_color=NA, main=paste(whichModule,"top",top,"kME",sep=""))

Figure 2.15 Heatmap of top 100 genes in the brown module. There is a clear differences in expression pattern between all three treatments. Nearly all the genes are highly up-regulated in the heat samples while the genes nearly all highly downregulated in the cool samples. This that different kinds of thermal stress may have opposite effects on the same pathways.

Lastly, we make fisher and kME files to use in GO enrichment analysis

The output shows that our data is correct as it gives the total number of isogroups and the number of isogroups in the brown module.

vsd <- read.csv("Bpsy_wgcna_allgenes.csv", row.names=1)
options(stringsAsFactors=FALSE)
data=t(vsd)
allkME =as.data.frame(signedKME(data, MEs))

whichModule="brown"

length(moduleColors)
## [1] 15962
inModule=data.frame("module"=rep(0,nrow(vsd)))
row.names(inModule)=row.names(vsd)
genes=row.names(vsd)[moduleColors == whichModule]
inModule[genes,1]=1
sum(inModule[,1])
## [1] 4731
write.csv(inModule,file=paste(whichModule,"brown_fisher.csv",sep=""),quote=F)

##KME method. input for delta ranks
modColName=paste("kME",whichModule,sep="")
modkME=as.data.frame(allkME[,modColName])
row.names(modkME)=row.names(allkME)
names(modkME)=modColName
write.csv(modkME,file=paste(whichModule,"brown_kME.csv",sep=""),quote=F)

Magenta module

First we make a plot of the gene-trait correlation significance plot to visualize the correlation. The variable weight is defined based on the trait in which the module is most correlated. For the magenta module, it had a correlation of -0.9 with the Cool treatment. It also has a correlaton of 0.72 with the Control treatment.

weight = as.data.frame(datTraits$Cool); 
names(weight) = "Cool" 
modNames = substring(names(MEs), 3)
geneModuleMembership = as.data.frame(cor(datExpr0, MEs, use = "p"));
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));
names(geneModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");
geneTraitSignificance = as.data.frame(cor(datExpr0, weight, use = "p"));
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));
names(geneTraitSignificance) = paste("GS.", names(weight), sep="");
names(GSPvalue) = paste("p.GS.", names(weight), sep="")

module = "magenta"
column = match(module, modNames);
moduleGenes = moduleColors==module;
par(mfrow = c(1,1));
verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),
                   abs(geneTraitSignificance[moduleGenes, 1]),
                   xlab = paste("ModMem in", module, "module"),
                   ylab = "Gene Sig for Magenta",
                   main = paste("MM vs. GS\n"),
                   cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)

Figure 2.16 Gene-trait significance correlation plot. This plots the gene significance in the cool trait against module membership in the magenta module. The magenta module has a strong linear correlation with the cool treatment with strong significance.

Now we prepare files to make heatmaps of module expression

First we make VSD files for the module and create a csv file of the rlog data for just our desired module.

vs=t(datExpr0)
cands=names(datExpr0[moduleColors=="magenta"]) 
c.vsd=vs[rownames(vs) %in% cands,]
nrow(c.vsd) 
## [1] 2659
write.csv(c.vsd,"rlog_MMmagenta.csv",quote=F)

We then use this csv file to prepare objects to plot the heatmap of the module. The vsd object specifies the rlog values of each isogroups in each sample. The allkME object assigned kME values to each isogroup for all the modules.

allkME =as.data.frame(signedKME(t(dat), MEs))
vsd=read.csv(file="rlog_MMmagenta.csv", row.names=1)

Plots to visualize module gene expression in each sample

which.module="magenta"
ME=MEs[, paste("ME",which.module, sep="")]
genes=datExpr0[,moduleColors==which.module ] 

par(mfrow=c(2,1), mar=c(0.3, 5.5, 5, 2))
plotMat(t(scale(genes) ),nrgcols=30,rlabels=F, clabels=rownames(genes), rcols=which.module)
par(mar=c(5, 4.2, 0, 0.7))
barplot(ME, col=which.module, main="", cex.main=2,
        ylab="eigengene expression",xlab="sample")

Figure 2.17 Eigengene of magenta module. Eigengenes show overall expression of the module in each sample. In each cool flask, the magenta module is highly down-regulated while it is less correlated although up-regulated in the control treatment. It shows little expression in the heat samples except for some expression in the heat_1.1 sample.

Now we make a heatmap of the expression of the top 100 genes in the magenta module.

whichModule="magenta"
top=100

datME=MEs
vsd <- read.csv("Bpsy_wgcna_allgenes.csv", row.names=1)
datExpr=t(vsd)
modcol=paste("kME",whichModule,sep="")
sorted=vsd[order(allkME[,modcol],decreasing=T),]
hubs=sorted[1:top,]

contrasting = colorRampPalette(rev(c("chocolate1","#FEE090","grey10", "cyan3","cyan")))(100)

pheatmap(hubs,scale="row",col=contrasting,border_color=NA, main=paste(whichModule,"top",top,"kME",sep=""))

Figure 2.18 Heatmap of top 100 genes in the magenta module. There is a clear differences in expression pattern between the cool treatment samples and the other treatments. Nearly all the genes are highly down-regulated in the cool samples while the genes are mainly not differentially expressed or slightly down-regulated or slightly up-regulated in the heat samples. The control samples show mainly up-regulation although some samples show more no differential expression. Unexpectedly, heat_1.1 is clustered with the control samples and not the other heat samples.

Lastly, we make fisher and kME files to use in GO enrichment analysis

The output shows that our data is correct as it gives the total number of isogroups and the number of isogroups in the turquoise module.

vsd <- read.csv("Bpsy_wgcna_allgenes.csv", row.names=1)
options(stringsAsFactors=FALSE)
data=t(vsd)
allkME =as.data.frame(signedKME(data, MEs))

whichModule="magenta"

length(moduleColors)
## [1] 15962
inModule=data.frame("module"=rep(0,nrow(vsd)))
row.names(inModule)=row.names(vsd)
genes=row.names(vsd)[moduleColors == whichModule]
inModule[genes,1]=1
sum(inModule[,1])
## [1] 2659
write.csv(inModule,file=paste(whichModule,"magenta_fisher.csv",sep=""),quote=F)

##KME method. input for delta ranks
modColName=paste("kME",whichModule,sep="")
modkME=as.data.frame(allkME[,modColName])
row.names(modkME)=row.names(allkME)
names(modkME)=modColName
write.csv(modkME,file=paste(whichModule,"magenta_kME.csv",sep=""),quote=F)

Conclusions

Analyzing gene expression in terms of different traits grouped our samples mainly based on treatment rather than the other trait variables.Merging our gene expression data in terms of similar patterns of expression lead us to have 5 modules. Of those 5, 4 were well correlated with different trait aspects of the experiment. The cyan module showed no strong correlation with any trait so it was not included in subsequent further analysis. The strongest correlations between modules and traits were found with the treatment variables. Although, there were also some strong correlations with average cell counts and average cells per mL. These correlations followed the same direction and similar strength as the heat treatment which is due to higher cell counts being correlated with exposure to higher temperatures. In particular, the brown module cannot be distinguished whether it is a function of high heat or high cell counts as the correlations are too similar. GO enrichment of this module may shed light on this distinction based on the terms in the module.